Causal Inference Methods for Observational Data

Zach Dickson

Fellow in Quantitative Methodology
London School of Economics

Schedule

  • Potential Outcomes Framework
    • The Experimental Ideal
    • Counterfactual reasoning
    • The Fundamental Problem of Causal Inference

10 minute break

  • Directed Acyclic Graphs (DAGs)
    • Confounding
    • Selection Bias

1 hour lunch break

  • Causal Identification in Observational Studies
    • Methods for estimating causal effects
    • Assumptions required for causal identification

10 minute break

  • Applied Example with DiD
    • Estimating the ATT
    • Interpreting the Results

My Background & Research Interests

Correlation vs. Causation

  • Correlation
    • A statistical measure that describes the extent to which two variables change together
    • Does not imply causation
  • Causation
    • A relationship between two variables where one variable causes the other to change
    • Establishing causation requires careful study and analysis

What is Causal Inference?

  • Goal: Estimate the causal effect of a treatment, policy, or intervention.
  • Key Question: What would have happened if things had been different?
  • Example: Does a new economic policy increase employment?

Counterfactual Thinking

  • Causal effects are defined in terms of potential outcomes.
  • The counterfactual is what would have happened in an alternative scenario.
  • Example:
    • You take a job training program → You get a higher salary.
    • Counterfactual: What would your salary have been if you had not taken the program?

Potential Outcomes Framework

  • Define treatment T:

    • T = 1 (treated)
    • T = 0 (control)
  • Define potential outcomes:

    • \(Y(1)\) = Outcome if treated
    • \(Y(0)\) = Outcome if not treated
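  • Causal Effect (for a single unit \(i\)):

    \[ \tau_i = Y_i(1) - Y_i(0) \]

  • The ATE averages this unit-level effect over all units: \(ATE = E[Y(1) - Y(0)]\)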

So what’s the problem?

\[ \tau_i = Y_i(1) - Y_i(0) \]

The Fundamental Problem of Causal Inference

  • We never observe both \(Y(1)\) and \(Y(0)\) for the same unit.
  • This makes causal inference challenging.
  • Solution: Use design or statistical methods to approximate the counterfactual.
  • The better we can approximate the counterfactual, the more accurate our causal estimates will be.

Observed vs. Missing Data

| Unit | Treated? (T) | Observed Outcome | Counterfactual Outcome |
|------|--------------|------------------|------------------------|
| A    | 1            | \(Y_A(1)\)       | \(Y_A(0)\) (Missing)   |
| B    | 0            | \(Y_B(0)\)       | \(Y_B(1)\) (Missing)   |

  • Key challenge: Estimating missing potential outcomes.
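
The table above can be reproduced with simulated data. This is a minimal sketch: the salary values, the constant effect of 4, and all variable names are assumptions made for illustration, not data from the course.

```r
# Simulated potential outcomes (illustrative values, not real data)
set.seed(1)
n  <- 6
y0 <- round(rnorm(n, mean = 30, sd = 5))   # salary (in thousands) without training
y1 <- y0 + 4                               # salary with training: +4 for everyone (assumed)
treated <- rep(c(1, 0), length.out = n)    # treatment indicator T

# The analyst observes only the potential outcome that matches the treatment received
y_obs <- ifelse(treated == 1, y1, y0)

# The other potential outcome is missing (NA) for every unit
po <- data.frame(
  unit    = LETTERS[1:n],
  treated = treated,
  y1      = ifelse(treated == 1, y1, NA),
  y0      = ifelse(treated == 0, y0, NA),
  y_obs   = y_obs
)
print(po)

# The true ATE is known here only because both potential outcomes were simulated
true_ate <- mean(y1 - y0)
```

In real data the `NA` cells can never be filled in directly, which is why the rest of the day is about approximating them.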

How can we approximate the counterfactual?

  • If we want to know what would happen if a unit was treated, we can compare it to similar units that were not treated.
  • Assumption: Units are similar in all respects except for the treatment.
  • Challenge: How do we know if units are similar?

The Experimental Ideal

Let’s think about the ideal experiment:

  • Randomized Experiments: Gold standard for causal inference.
  • When we are in control of the world, we can randomly assign treatment.
    • Random Assignment: Ensures that treatment is independent of potential outcomes.
    • Counterfactuals: The control group's average outcome stands in for what the treated group would have experienced without treatment.
    • Causal Effects: Can be estimated without bias.

Randomized Controlled Trials (RCTs)

In a randomized controlled trial (RCT):

  • Units are randomly assigned to treatment or control.
  • The treatment group receives the treatment, the control group does not.
  • The average treatment effect (ATE) is estimated by the difference in mean outcomes between the two groups.
  • \(ATE = E[Y(1) - Y(0)]\)
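
A small simulation can show why random assignment works. This is a sketch under assumed values: the true effect of 2, the role of `motivation`, and the sample size are all made up for illustration.

```r
set.seed(42)
n <- 10000
motivation <- rnorm(n)                      # unobserved trait that raises salary
y0 <- 30 + 5 * motivation + rnorm(n)        # potential outcome without training
y1 <- y0 + 2                                # constant true effect of 2 (assumed)

treated <- rbinom(n, size = 1, prob = 0.5)  # coin-flip assignment
y <- ifelse(treated == 1, y1, y0)           # observed outcome

# Randomisation balances motivation across groups (difference close to 0)
balance <- mean(motivation[treated == 1]) - mean(motivation[treated == 0])

# The difference in means estimates the ATE
ate_hat <- mean(y[treated == 1]) - mean(y[treated == 0])
c(balance = balance, ate_hat = ate_hat)
```

Because assignment ignores `motivation`, the two groups are comparable on average and the difference in means lands close to the assumed effect.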

RCTs are not always feasible

  • Ethical Concerns: Not all research questions can be answered with experiments.
  • Practical Concerns: Experiments can be expensive, time-consuming, or infeasible.
  • External Validity: Findings from experiments may not generalize to other contexts.

Observational Studies

  • Observational Studies: Research designs where treatment is not randomly assigned.

    • Example:
      • Treatment: Job training program
      • Outcome: Salary
      • Problem: People who choose to take the program may be different from those who don’t.
  • Key Challenge: Estimating causal effects without bias.
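
One way to see the problem is to decompose the naive comparison of treated and untreated units. The decomposition below is a standard identity written in the notation of these slides; the underbrace labels are added for illustration.

\[
\underbrace{E[Y \mid T = 1] - E[Y \mid T = 0]}_{\text{naive difference in means}}
= \underbrace{E[Y(1) - Y(0) \mid T = 1]}_{ATT}
+ \underbrace{E[Y(0) \mid T = 1] - E[Y(0) \mid T = 0]}_{\text{selection bias}}
\]

If people who take the programme would have earned more even without it, the last term is positive and the naive difference overstates the effect.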

Approximating the Experimental Ideal

  • Using what we know about the world, we can try to approximate the counterfactual.
  • Assumption: Units are similar in all respects except for the treatment.
  • Methods:
    • Matching
    • Difference-in-Differences
    • Instrumental Variables
    • Regression Discontinuity.
  • We’ll explore these in later sessions!

Summary

  • Causal inference helps us answer “what if” questions about the world.
  • The potential outcomes framework defines causal effects.
  • The fundamental problem of causal inference is that we never observe both potential outcomes.
  • Next session: Directed Acyclic Graphs & Bias.

10 minute break

📌 Questions? Feel free to ask!

Session 2: Directed Acyclic Graphs (DAGs)

Directed Acyclic Graphs (DAGs)

  • Directed Acyclic Graphs (DAGs):
    • A visual tool for representing causal relationships.
    • Nodes represent variables, edges represent causal relationships.
    • Acyclic: No feedback loops. Following the arrows from a variable never leads back to that same variable.

Directed Acyclic Graphs (DAGs)

  • Bias in [observational] studies:
    • Confounding
    • Selection Bias/Endogeneity
  • DAGs help identify these sources of bias.

DAGs

source: Brophy (2021)

Confounding

  • Definition: Confounding occurs when a third variable (a confounder) influences both the treatment and the outcome, creating a spurious association.

  • Problem: Confounders are pre-treatment variables that affect both treatment assignment and the outcome.

  • Effect: Confounding distorts the causal effect because it makes it unclear whether the observed effect is due to the treatment or the confounding variable.

  • Confounders cannot be identified from statistical associations alone; identifying them requires substantive knowledge of how the data were generated.

Confounding Example

RQ: Does job training increase salary?

  • Example:
    • Treatment: Job training program
    • Outcome: Salary
    • Confounder: Motivation
  • Problem: Motivation raises both the chance of taking the programme and salary, so treatment and outcome are correlated even if the training itself has no effect.

Confounding DAG

Selection Bias/Endogeneity

  • Definition: Selection bias occurs when the sample used in the analysis is not representative of the population due to a non-random selection process.

  • Problem: Arises when the probability of being included in the study depends on the treatment, the outcome, or factors related to both.

  • Effect: Creates spurious associations between the treatment and outcome that do not reflect the true causal effect.

Selection Bias Example

  • Example:
    • Treatment: Job Training Program
    • Outcome: Employment
    • Selection Bias: Some people complete the training, some don’t
  • Conditioning on a collider induces a spurious association between the treatment and outcome.
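
Collider bias can be demonstrated with simulated data. This sketch assumes a stylised setup: `training` and `employment` are generated independently (so the true effect is zero), and `in_sample` is a hypothetical selection rule that depends on both.

```r
set.seed(7)
n <- 10000
training   <- rnorm(n)    # treatment intensity (standardised, hypothetical)
employment <- rnorm(n)    # outcome, generated independently: true effect is 0

# Collider: being in the analysed sample depends on both treatment and outcome
in_sample <- (training + employment + rnorm(n)) > 1

cor_full     <- cor(training, employment)                        # close to 0
cor_selected <- cor(training[in_sample], employment[in_sample])  # clearly negative
c(full = cor_full, selected = cor_selected)
```

Nothing causal links the two variables, yet restricting the analysis to the selected units produces an association.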

Another Selection Bias Example

  • Example:
    • Treatment: Going to the doctor
    • Outcome: Health
    • Selection Bias: Only sick people go to the doctor
  • Conditioning on whether someone goes to the doctor induces a spurious association between going to the doctor and health.

Other forms of selection bias

  • Survival Bias: Only observing units that survive until the end of the study.
  • Non-response Bias: Only observing units that respond to the survey.
  • Publication Bias: Only observing studies that report significant results.
  • Attrition Bias: Only observing units that remain in the study.

Addressing Confounding

  • We can condition on confounding variables to block backdoor paths.
  • Conditioning: Adjusting for confounders in the analysis.
    • e.g. ‘controlling for’ or ‘including’ variables in a regression model.
  • We can’t really ‘control’ for all confounders, but we can try to identify and adjust for the most important ones.
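
The job training example can be simulated to show what conditioning does. This is a sketch under assumed values: the true effect of 2 and the strength of `motivation` are made up, and in real data motivation is rarely measured this cleanly.

```r
set.seed(123)
n <- 5000
motivation <- rnorm(n)
# More motivated people are more likely to enrol in training
training <- rbinom(n, 1, plogis(1.5 * motivation))
# True effect of training is 2 (assumed); motivation also raises salary
salary <- 30 + 2 * training + 3 * motivation + rnorm(n)

# Naive comparison: leaves the backdoor path through motivation open
naive <- unname(coef(lm(salary ~ training))["training"])

# Adjusted comparison: conditioning on motivation blocks the backdoor path
adjusted <- unname(coef(lm(salary ~ training + motivation))["training"])

c(naive = naive, adjusted = adjusted)
```

The naive coefficient is well above 2, while the adjusted coefficient is close to it. This only works because the confounder is observed and the regression is correctly specified.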

Addressing Selection Bias

  • This is more challenging than confounding because the bias arises from how units enter the sample, not from pre-treatment variables we can adjust for.

  • Solutions:

    • Randomized Experiments: Random assignment ensures that treatment is independent of potential outcomes.
    • Instrumental Variables: Use an instrument to estimate the causal effect.

Activity

  • Identify a research question of interest.
    • If you can’t think of one, ask me (or ChatGPT) for suggestions!
  • Create a DAG that represents the causal relationships.
  • Identify potential sources of confounding or selection bias.
  • Think about how you would address these issues in your analysis.

Bonus: Create a DAG in R using the dagitty package and visualize it using ggdag.

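library(dagitty)  # define and analyse DAGs
library(ggplot2)  # provides labs()
library(ggdag)    # plot DAGs with ggplot2

# Z (motivation) is a common cause of X (job training) and Y (salary)
dag <- dagitty('dag {
  X -> Y
  Z -> X
  Z -> Y
}')

# Plot the DAG; the caption explains the node labels
ggdag(dag, text_size = 10, node_size = 14) + theme_dag() +
  labs(caption = "X = Job Training\nY = Salary\nZ = Motivation")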

Summary

  • Directed Acyclic Graphs (DAGs):
    • Visual tool for representing causal relationships.
    • Help identify sources of bias in observational studies.
  • Confounding:
    • A third variable influences both the treatment and outcome.
    • Distorts the causal effect.
  • Selection Bias/Endogeneity:
    • Non-random selection process creates spurious associations.
    • More challenging to address than confounding.

More on DAGs

  • Resources:
    • The Effect: An Introduction to Research Design and Causality by Nick Huntington-Klein
    • (Mostly Clinical) Epidemiology with R by James Brophy
    • DAGitty: Online tool for creating and analyzing DAGs
    • ggdag: R package for visualizing DAGs

Lunch Break

  • Time: 1 hour

Session 3: Causal Identification in Observational Studies

Causal Identification in Observational Studies

  • Causal Identification: Establishing the assumptions under which a causal effect can be recovered from observational data.
  • Common Methods:
    • Difference-in-Differences (DiD)
    • Regression Discontinuity (RD)
    • Instrumental Variables (IV)
  • Goal: Estimate causal effects without bias.

Common Estimands in Causal Inference

  • ATE (Average Treatment Effect):
    • The average causal effect of treatment on the outcome.
    • \(ATE = E[Y(1) - Y(0)]\)
  • ATT (Average Treatment Effect on the Treated):
    • The average causal effect of treatment on the treated units.
    • \(ATT = E[Y(1) - Y(0) | T = 1]\)
  • LATE (Local Average Treatment Effect):
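    • The causal effect of treatment for compliers (units whose treatment status is changed by the instrument) in an IV setting.
    • \(LATE = E[Y(1) - Y(0) | T(Z = 1) > T(Z = 0)]\), where \(T(Z = z)\) is the treatment a unit takes when the instrument \(Z\) equals \(z\).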
  • CATE (Conditional Average Treatment Effect):
    • The causal effect of treatment for a specific subgroup of units.
    • \(CATE = E[Y(1) - Y(0) | X]\)
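
When effects differ across units, these estimands give different numbers. The sketch below uses simulated potential outcomes; the subgroup `x`, the effect sizes (4 and 1), and the take-up rates are assumptions chosen for illustration.

```r
set.seed(2024)
n <- 20000
x   <- rbinom(n, 1, 0.5)            # subgroup indicator (hypothetical covariate)
tau <- ifelse(x == 1, 4, 1)         # unit-level effect: 4 if x = 1, 1 if x = 0 (assumed)
y0  <- rnorm(n, mean = 10)
y1  <- y0 + tau

# Units with x = 1 are more likely to take the treatment
treated <- rbinom(n, 1, ifelse(x == 1, 0.8, 0.2))

ate  <- mean(tau)                   # average over everyone: about 2.5
att  <- mean(tau[treated == 1])     # average over the treated: about 3.4
cate <- tapply(tau, x, mean)        # average within each subgroup: 1 and 4
list(ate = ate, att = att, cate = cate)
```

The ATT exceeds the ATE here because the high-effect subgroup is over-represented among the treated.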

The Experimental Ideal (again)

  • Randomized Experiments (RCTs):
    • Gold standard for causal inference.
    • Random assignment ensures treatment is independent of potential outcomes.
    • Causal effects can be estimated without bias.
    • Estimates the ATE.
  • We want to keep returning to this ideal when designing observational studies.

The toolkit for causal inference in observational studies

  • Difference-in-Differences (DiD)
  • Regression Discontinuity (RD)
  • Instrumental Variables (IV)

Difference-in-Differences (DiD)

  • Workhorse of causal inference in economics and social sciences.
  • Idea: Compare changes in outcomes over time between treated and control groups.
  • Estimand: The average treatment effect of the treatment on the treated.

Difference-in-Differences (DiD)

  • Assumptions:
    • Parallel Trends: Treated and control groups have parallel trends in the absence of treatment.
    • Common Shocks: No unobserved shocks that affect treated and control groups differently.
    • Stable Treatment: Treatment does not change over time.
    • No Spillovers: Treatment does not affect control units.
    • No Anticipation: Units do not anticipate the treatment.
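
With two groups and two periods, the DiD estimate is just a difference of four means. This sketch uses simulated data in which parallel trends holds by construction; the group gap of 5, the common trend of 2, and the effect of 3 are assumed values.

```r
set.seed(10)
n <- 4000
group <- rep(c(0, 1), each = n / 2)     # 1 = treated group
post  <- rep(c(0, 1), times = n / 2)    # 1 = after the treatment date

# Treated group starts 5 higher; both groups trend up by 2; true effect is 3 (assumed)
y <- 20 + 5 * group + 2 * post + 3 * group * post + rnorm(n)

# Cell means by group and period
m <- tapply(y, list(group = group, post = post), mean)

# (change in treated group) minus (change in control group)
did <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

# The same number is the interaction coefficient in a regression
did_lm <- unname(coef(lm(y ~ group * post))["group:post"])
c(did = did, did_lm = did_lm)
```

The level difference between groups and the common time trend both cancel out, leaving an estimate close to 3.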

DiD Example

Does a new training programme increase employment rates?

The UK Government introduces a training programme to increase employment. The programme is rolled out to local authorities at different times. We want to estimate the causal effect of the programme on employment rates.

DiD Example

  • Treatment:
    • Local authorities that receive the training programme.
  • Control:
    • Local authorities that do not receive the training programme.
  • Outcome:
    • Employment rates before and after the programme is introduced.

DiD Example

  • Assumptions:
    • Parallel Trends: Employment rates would have followed the same trend in treated and control groups in the absence of the programme.
    • Common Shocks: No unobserved shocks (other than the treatment) that affect treated and control groups differently.
    • Stable Treatment: The programme does not change over time.
    • No Spillovers: The programme does not affect control units.
    • No Anticipation: Local authorities do not anticipate the programme.

Regression Discontinuity (RD)

Many policies assign treatment based on a threshold. RD estimates causal effects by comparing outcomes just above and below the threshold.

  • Idea: Compare outcomes for units just above and below an [arbitrary] threshold.
  • Key Assumption: Units just above and below the threshold are similar in all other respects.
  • Estimand: Usually a local average treatment effect (LATE) that applies to units near the threshold.
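
A sharp RD can be sketched with a local linear regression on either side of the cutoff. Everything here is simulated: the jump of 2, the bandwidth `h`, and the variable names are assumptions, and in applied work the bandwidth would be chosen in a data-driven way, not by hand.

```r
set.seed(99)
n <- 4000
running <- runif(n, -1, 1)               # running variable, centred at the threshold 0
treated <- as.numeric(running >= 0)      # sharp assignment rule

# Outcome is smooth in the running variable apart from a jump of 2 at the cutoff (assumed)
y <- 1 + 1.5 * running + 2 * treated + rnorm(n, sd = 0.5)

h    <- 0.25                             # bandwidth, picked by hand for illustration
near <- abs(running) <= h                # keep only units close to the threshold

# Separate slopes on each side; the coefficient on treated is the jump at 0
fit    <- lm(y ~ treated * running, subset = near)
rd_hat <- unname(coef(fit)["treated"])
rd_hat
```

The estimate describes units at the threshold only, which is why the estimand is local.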

Regression Discontinuity (RD)

  • Assumptions:
    • Continuity: Potential outcomes change smoothly across the threshold, so without treatment there would be no jump in the outcome at the cutoff.
    • No Manipulation: Units cannot manipulate the threshold to change treatment assignment.
    • No Spillovers: Treatment does not affect units on the other side of the threshold.

RD Example

Does the size of a seminar group affect student performance?

The maximum number of students per seminar group at LSE is 30. We want to know whether seminar group size affects student performance. We compare student performance just above and below the threshold.

RD Example

  • Treatment:
    • Seminar groups with more than 30 students.
  • Control:
    • Seminar groups with 30 or fewer students.
  • Outcome:
    • Student performance just above and below the threshold.

RD Example

  • Assumptions:
    • Continuity: Absent the treatment, student performance would vary smoothly across the threshold, with no jump at 30.
    • No Manipulation: Students cannot manipulate seminar group size to change treatment assignment.
    • No Spillovers: Seminar group size does not affect students in other groups.

Instrumental Variable (IV)

  • Idea: Use an instrument to estimate the causal effect of treatment.
  • Instrument: A variable that affects treatment and affects the outcome only through treatment.
  • Estimand: LATE (Local Average Treatment Effect) for compliers.

Instrumental Variable (IV)

  • Assumptions:
    • Relevance: The instrument affects treatment.
    • Exogeneity: The instrument is as good as randomly assigned, so it shares no unmeasured common causes with the outcome.
    • Exclusion: The instrument does not affect the outcome except through the treatment.
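
With a binary instrument, the IV estimate is the effect of the instrument on the outcome divided by its effect on treatment (the Wald estimator). The simulation below is a sketch in which all three assumptions hold by construction; the variable names, the true effect of 2, and the other coefficients are assumed.

```r
set.seed(5)
n <- 20000
ability      <- rnorm(n)             # unobserved confounder
near_college <- rbinom(n, 1, 0.5)    # binary instrument, randomly assigned here

# College attendance depends on the instrument and on ability
college <- as.numeric(0.8 * near_college + ability + rnorm(n) > 0.5)

# True effect of college on earnings is 2 (assumed); ability also raises earnings
earnings <- 10 + 2 * college + 1.5 * ability + rnorm(n)

# OLS is biased upward because ability is omitted
ols <- unname(coef(lm(earnings ~ college))["college"])

# Wald estimator: reduced form divided by first stage
reduced_form <- mean(earnings[near_college == 1]) - mean(earnings[near_college == 0])
first_stage  <- mean(college[near_college == 1]) - mean(college[near_college == 0])
iv <- reduced_form / first_stage

c(ols = ols, first_stage = first_stage, iv = iv)
```

The IV estimate is noisier than OLS but centred on the assumed effect, while OLS is not. A weak first stage would make the ratio unstable.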

IV Example

Does college education increase earnings?

There are many reasons why people choose to go to college, such as ability, motivation, and family background. One factor that may affect college attendance is proximity to a college. We can use proximity to a college as an instrument for college attendance (Card 1993).

IV Example

  • Treatment:
    • College attendance.
  • Instrument:
    • Proximity to a college.
  • Outcome:
    • Earnings.

IV Example

  • Assumptions:
    • Relevance: Proximity to a college affects college attendance.
    • Exogeneity: Proximity to a college is as good as randomly assigned, so it shares no unmeasured common causes with earnings.
    • Exclusion: Proximity to a college does not affect earnings except through college attendance.

Can we think of any issues with this IV example?

  • How might proximity to a college be related to earnings?
    • How might being close to a college affect earnings through other channels?
    • How might the exclusion restriction be violated?

Summary

  • Causal Identification: Establishing the assumptions under which a causal effect can be recovered from observational data.
  • Methods:
    • Difference-in-Differences (DiD): Compare changes in outcomes over time between treated and control groups.
    • Regression Discontinuity (RD): Compare outcomes just above and below a threshold.
    • Instrumental Variables (IV): Use an instrument to estimate the causal effect.
  • Assumptions: Each method relies on specific assumptions to identify causal effects.

10 Minute Break

Session 4: Estimating Causal Effects in Observational Studies

Full Example:

  • Difference-in-Differences (DiD)
    • Assumptions required for causal identification
    • Estimating the causal effect
    • Interpreting the results

Difference-in-Differences (DiD) recap

  • Idea: Compare changes in outcomes over time between treated and control groups.
  • Assumptions:
    • Parallel Trends: Treated and control groups have parallel trends in the absence of treatment.
    • Common Shocks: No unobserved shocks that affect treated and control groups differently.
    • Stable Treatment: Treatment does not change over time.
    • No Spillovers: Treatment does not affect control units.
    • No Anticipation: Units do not anticipate the treatment.

DiD Example

In April 2020 during the height of the COVID pandemic, Donald Trump sent three messages on Twitter calling for the “liberation” of three specific states under lockdown. We want to estimate the causal effect of these tweets on social distancing behavior.

DiD Example

  • Treatment:
    • States that Trump targeted.
  • Control:
    • States that were not targeted.
  • Outcome:
    • Social distancing behavior before and after the tweets.

DiD Example Assumptions

What assumptions do we need to identify the causal effect of Trump’s tweets on social distancing behavior?

  • Parallel Trends: Social distancing behavior would have followed the same trend in treated and control states in the absence of the tweets.
  • Common Shocks: No unobserved shocks (other than the tweets) that affect treated and control states differently.
  • No Spillovers: The tweets do not affect control states.
  • No Anticipation: States do not anticipate the tweets.

What are potential threats to these assumptions?

  • Common Shocks: What other events might have affected social distancing behavior?
  • No Spillovers: How might the tweets affect control states?

Estimating the Causal Effect

  • DiD Estimator:
    • \[Y_{it} = \alpha_i + \lambda_t + \delta (\text{treated}_i \times \text{post}_t) + \epsilon_{it}\]

where:

  • \(Y_{it}\) = Mobility for state \(i\) on day \(t\)
  • \(\alpha_i\) = state fixed effects
  • \(\lambda_t\) = day fixed effects
  • \(\delta\) = DiD coefficient; under the assumptions above it estimates the ATT
  • \(\text{treated}_i\) = indicator for treated states
  • \(\text{post}_t\) = indicator for post-treatment period
  • \(\epsilon_{it}\) = error term
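
The regression above can be sketched in base R on a simulated state-by-day panel. This is not the course notebook: the panel, the effect `true_delta`, and all variable names are hypothetical, and a real analysis would also cluster standard errors by state.

```r
set.seed(11)
states <- paste0("state_", 1:20)     # hypothetical units
days   <- 1:30
panel  <- expand.grid(state = states, day = days, stringsAsFactors = FALSE)

panel$treated <- as.numeric(panel$state %in% states[1:3])  # three treated states
panel$post    <- as.numeric(panel$day > 15)                # treatment starts after day 15
panel$did     <- panel$treated * panel$post                # treated_i x post_t

# Simulated outcome: state effects + day effects + treatment effect + noise
state_fe   <- setNames(rnorm(length(states)), states)
day_fe     <- rnorm(length(days))
true_delta <- 1.5                                          # assumed effect
panel$y <- unname(state_fe[panel$state]) + day_fe[panel$day] +
  true_delta * panel$did + rnorm(nrow(panel), sd = 0.5)

# State and day dummies play the role of alpha_i and lambda_t
fit <- lm(y ~ did + factor(state) + factor(day), data = panel)
delta_hat <- unname(coef(fit)["did"])
delta_hat
```

The main effects of `treated` and `post` do not appear separately because the state and day fixed effects absorb them.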

Getting the data

GitHub Repository: Causal Inference Methods for Observational Data

Let’s switch to R

This notebook, DID_notebook.qmd, is available in the Code folder of the GitHub repository.

Note: You will need to download the .qmd file and open it in RStudio to run the code.

Summary

  • Difference-in-Differences (DiD):
    • Compare changes in outcomes over time between treated and control groups.
    • Assumptions: Parallel Trends, Common Shocks, Stable Treatment, No Spillovers, No Anticipation.
    • Estimating the causal effect: DiD estimator.

Summary of the Day

  • Potential Outcomes Framework:
    • Defines causal effects in terms of potential outcomes.
    • The fundamental problem of causal inference.
  • Directed Acyclic Graphs (DAGs):
    • Visual tool for representing causal relationships.
    • Identify sources of bias in observational studies.
  • Causal Identification in Observational Studies:
    • Methods for estimating causal effects without bias.
    • Difference-in-Differences, Regression Discontinuity, Instrumental Variables.
  • Estimating Causal Effects in Observational Studies:
    • Much more challenging than in experiments.
    • Requires careful attention to assumptions and methods.

Additional Resources

Questions?

  • Feel free to ask any questions you have about the material we’ve covered today.

Thank you!

Reach out if you have further questions - z.dickson@lse.ac.uk

References

Card, David. 1993. “Using Geographic Variation in College Proximity to Estimate the Return to Schooling.” NBER Working Paper. Cambridge, MA: National Bureau of Economic Research.